Exploiting Web-based Collective Knowledge for Micropost Normalisation

Authors

  • Óscar Muñoz-García
  • Silvia Vázquez
  • Núria Bel
Abstract

Normalising user-generated content is a crucial step before analysing social media posts, particularly on Twitter. This paper presents a method for the morphological normalisation of tweets using online, collectively developed resources, including Wikipedia and an SMS lexicon. The results obtained demonstrate that these resources are a valuable source of knowledge for generating the dictionaries used in the normalisation task.
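The dictionary-driven idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `KNOWN_WORDS`, `SMS_LEXICON`, and `normalise` are hypothetical, and the tiny word lists are toy stand-ins for dictionaries that would be generated from Wikipedia and an SMS lexicon.

```python
import re

# Hypothetical toy resources (assumptions for illustration only).
# KNOWN_WORDS stands in for a vocabulary derived from Wikipedia text;
# SMS_LEXICON stands in for a collectively built SMS abbreviation lexicon.
KNOWN_WORDS = {"see", "you", "tomorrow", "at", "the", "great"}
SMS_LEXICON = {"u": "you", "2moro": "tomorrow", "gr8": "great"}

# A token is an optional @/# prefix plus word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"[@#]?\w+|[^\w\s]")


def normalise(text):
    """Return the tokens of `text`, replacing SMS variants with standard forms."""
    normalised = []
    for token in TOKEN_RE.findall(text):
        lower = token.lower()
        if token[0] in "@#":
            # Mentions and hashtags are left untouched.
            normalised.append(token)
        elif lower in KNOWN_WORDS:
            # Already a standard word: keep it as written.
            normalised.append(token)
        elif lower in SMS_LEXICON:
            # Non-standard variant found in the lexicon: substitute it.
            normalised.append(SMS_LEXICON[lower])
        else:
            # Unknown token: pass through unchanged.
            normalised.append(token)
    return normalised


print(normalise("See u 2moro @bob, gr8!"))
```

Checking the known-word vocabulary before the SMS lexicon is a design choice in this sketch, so that a valid word is never overwritten by a lexicon entry with the same spelling.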

Similar resources

On-line Adaptation of Search via Knowledge Reuse

We have integrated the distributed search of genetic programming based systems with collective memory to form a collective adaptation search method. Such a system significantly improves search as problem complexity is increased. In collective adaptation, search agents gather knowledge of their environment and deposit it in a central information repository. Process agents are then able to manip...


SACI: Sentiment analysis by collective inspection on social media content

Collective opinions observed in Social Media represent valuable information for a range of applications. In pursuit of such information, current methods require prior knowledge of each individual opinion to determine the collective one in a post collection. In contrast, we assume that collective analysis could be better performed when exploiting overlaps among distinct posts of the collec...


Augmenting Collective Adaptation with Simple Process Agents

We have integrated the distributed search of genetic programming based systems with collective memory to form a collective adaptation search method. Such a system significantly improves search as problem complexity is increased. However, there is still considerable scope for improvement. In collective adaptation, search agents gather knowledge of their environment and deposit it in a central in...


AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables

We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity b...


Knowledge Extraction in Web Media: At The Frontier of NLP, Machine Learning and Semantics

We identify two main factors that can cause numerous difficulties when developing a generic entity linking system: i) the amount of data currently available on the Web, which keeps increasing and a large part of which comes in the form of natural language texts; ii) the velocity at which data is published, which may require processing streams of text in near real-time. Social media platforms suc...



Publication date: 2013